A company is migrating a legacy application to an Amazon S3-based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data. Which solution will meet these requirements with the LEAST operational overhead?
[ ] A. Write a custom extract, transform, and load (ETL) job in Python. Import the pandas library. Use the DataFrame.drop_duplicates() method to perform data deduplication.
[x] B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to perform data deduplication.
[ ] C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
[ ] D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
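Note on option B: FindMatches is meant to identify records that refer to the same entity even when their fields do not match exactly, which exact-match approaches such as drop_duplicates() cannot do. The sketch below is not FindMatches and uses no AWS API. It is a minimal pure-Python illustration, under the assumption of a hypothetical two-field record and a hand-written normalization key, of why exact comparison leaves near-duplicates behind.

```python
import re

def normalize(record):
    """Hypothetical matching key: lowercase, drop punctuation, collapse whitespace."""
    def clean(text):
        text = re.sub(r"[^a-z0-9 ]", "", text.lower())
        return " ".join(text.split())
    name, city = record
    return (clean(name), clean(city))

def dedupe(records, key=lambda r: r):
    """Keep the first record seen for each key, preserving input order."""
    seen, kept = set(), []
    for record in records:
        k = key(record)
        if k not in seen:
            seen.add(k)
            kept.append(record)
    return kept

# Illustrative sample data (not from any real dataset).
records = [
    ("Acme Corp.", "Seattle"),
    ("Acme Corp.", "Seattle"),   # exact duplicate
    ("ACME  corp", "seattle"),   # same entity, different formatting
    ("Globex", "Portland"),
]

exact = dedupe(records)                  # exact comparison only
relaxed = dedupe(records, key=normalize) # comparison on the normalized key

print(len(exact), len(relaxed))
```

Exact comparison removes only the verbatim copy, while the normalized key also collapses the differently formatted row. A managed ML transform avoids having to write and maintain rules like `normalize` by hand, which is the operational-overhead argument for option B.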
A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3. Which actions will provide the FASTEST queries? (Choose two.)
[ ] A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
[x] B. Use a columnar storage file format.
[x] C. Partition the data based on the most common query predicates.
[ ] D. Split the data into files that are less than 10 KB.
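Note on options B and C: a columnar format lets the query engine read only the columns a query needs, and partitioning lets it skip whole partitions whose values do not match the query predicates. The sketch below only simulates partition pruning over Hive-style object keys in plain Python; it does not call Redshift Spectrum, and the bucket name, prefix, and partition columns (`year`, `month`) are assumptions made for illustration.

```python
def parse_partitions(key):
    """Extract name=value partition segments from a Hive-style object key."""
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts

def prune(keys, **predicates):
    """Return only the keys whose partition values satisfy every predicate."""
    return [
        key for key in keys
        if all(parse_partitions(key).get(name) == value
               for name, value in predicates.items())
    ]

# Hypothetical layout: data partitioned by the columns most queries filter on.
keys = [
    "s3://example-bucket/sales/year=2023/month=12/part-0000.parquet",
    "s3://example-bucket/sales/year=2024/month=01/part-0000.parquet",
    "s3://example-bucket/sales/year=2024/month=02/part-0000.parquet",
]

# A filter like "year = 2024 AND month = 01" needs only one of the three files.
selected = prune(keys, year="2024", month="01")
print(selected)
```

The fewer objects that survive pruning, the less data has to be scanned, which is why the partition columns should match the most common query predicates.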
A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance. The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet. Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)
[ ] A. Turn on the public access setting for the DB instance.
[ ] B. Update the security group of the DB instance to allow only Lambda function invocations on the database port.
[x] C. Configure the Lambda function to run in the same subnet that the DB instance uses.
[x] D. Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.
[ ] E. Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.
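Note on options C and D: the two steps can be sketched as request parameters for boto3's EC2 `authorize_security_group_ingress` and Lambda `update_function_configuration` calls. The code below only builds the parameter dictionaries and makes no AWS calls. The security group ID, subnet ID, and function name are hypothetical placeholders, and port 3306 assumes a MySQL-compatible engine.

```python
def self_referencing_ingress(group_id, port):
    """Parameters for an ingress rule whose source is the security group itself."""
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # The source is the same group, so members can reach each other.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }],
    }

def lambda_vpc_config(function_name, subnet_ids, group_id):
    """Parameters that attach a Lambda function to VPC subnets and a security group."""
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": [group_id],
        },
    }

# Hypothetical identifiers, for illustration only.
SG_ID = "sg-0123456789abcdef0"
SUBNET_ID = "subnet-0123456789abcdef0"

ingress = self_referencing_ingress(SG_ID, port=3306)  # 3306: assumed MySQL port
vpc = lambda_vpc_config("example-function", [SUBNET_ID], SG_ID)

# With credentials configured, these would be passed as keyword arguments:
#   boto3.client("ec2").authorize_security_group_ingress(**ingress)
#   boto3.client("lambda").update_function_configuration(**vpc)
print(ingress, vpc)
```

Because the rule's source is the group itself, any resource that carries the group (here, both the function and the DB instance) can reach the others on the database port without opening the instance to other sources.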